Evaluation of Different Approaches to Training a Genre Classifier
Abstract
This paper presents experiments on classifying web pages by genre. First, a corpus of 1539 manually labeled web pages was prepared. Second, 502 genre features were selected based on the literature and on observation of the corpus. Third, these features were extracted from the corpus to obtain a data set. Finally, three machine learning algorithms, one for inducing decision trees (J48) and two ensemble algorithms (bagging and boosting), were trained and tested on the data set. Additionally, the impact of feature selection on the ensemble algorithms was tested. The best-performing genre classifiers in terms of precision were selected to form a "best-of" set of classifiers. On average, the best-of set achieved 9% higher precision but slightly lower recall. Accuracy and F-measure did not vary significantly. The results indicate that classification by genre could be a useful addition to search engines.
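The "best-of" selection step can be sketched as follows. This is a minimal sketch, not the authors' implementation. It assumes that each algorithm yields one binary classifier per genre, summarized by its true positive, false positive and false negative counts on test data, and that averaged scores are plain means over genres (macro-averaging). The names `Counts`, `best_of_set` and `macro_average` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumption (not from the paper): each algorithm yields one binary
# classifier per genre, summarized by its confusion counts on test data.


@dataclass(frozen=True)
class Counts:
    tp: int  # pages of the genre that were labeled with it
    fp: int  # pages of other genres wrongly labeled with it
    fn: int  # pages of the genre that the classifier missed


def precision(c: Counts) -> float:
    return c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0


def recall(c: Counts) -> float:
    return c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0


def f_measure(c: Counts) -> float:
    # Harmonic mean of precision and recall (F1).
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical layout: results[algorithm][genre] -> Counts,
# with algorithm names such as "J48", "Bagging" or "Boosting".
Results = Dict[str, Dict[str, Counts]]


def best_of_set(results: Results) -> Dict[str, str]:
    """Pick, for every genre, the algorithm whose classifier has the highest precision.

    Assumes all algorithms were evaluated on the same genres; ties go to the
    algorithm listed first, because max() returns the first maximal item.
    """
    genres = next(iter(results.values())).keys()
    return {
        genre: max(results, key=lambda alg: precision(results[alg][genre]))
        for genre in genres
    }


def macro_average(results: Results, selection: Dict[str, str],
                  metric: Callable[[Counts], float]) -> float:
    """Mean of `metric` over genres, using the classifier chosen for each genre."""
    scores = [metric(results[alg][genre]) for genre, alg in selection.items()]
    return sum(scores) / len(scores)
```

To compare against a single algorithm, pass a selection that maps every genre to that algorithm, e.g. `{g: "Bagging" for g in results["Bagging"]}`. Because each genre's classifier is chosen on precision alone, nothing in this selection protects recall, which is consistent with the trade-off reported above.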
Similar Resources
Cross-genre training for automatic prosody classification
We consider methods for training a prosodic classifier using labeled training data from a different genre than the one on which the system will be deployed. Two binary tasks are considered: word-level pitch accent and phrase boundary detection. Using radio news and conversational telephone speech, we consider cross-genre training using acoustic and textual features, and find that acoustic featu...
Automatic music genre classification using second-order statistical measures for the prescriptive approach
Several works proposed for automatic musical genre classification are based on various combinations of parameters, exploiting different models. However, the comparison of all previous works remains impossible since they used different target taxonomies, genre definitions and databases. In this paper, the world's largest music database (Real World Computing) is used. Also, different measures re...
Machine Vision for Egg Fertility Detection and Evaluation of the Performance of Neural Networks and Support Vector Machines Within It
In this research, a system is proposed for detecting the fertility of eggs. The system is composed of two parts: hardware and software. The fabricated hardware provides a platform to obtain accurate images from the inner side of the eggs without harming their embryos. The software part includes a set of image processing and machine vision processes, which is able to detect the fertility of eggs from c...
Comparison of the Performance of Genre Classifiers Trained by Different Machine Learning Algorithms
Modern search engines aim at classifying web pages not only according to topics, but also according to genres. This paper presents the results of an attempt to train a genre classifier. We present features extracted from a 20-genre corpus used for training the genre classifiers and the results of using different machine learning (ML) algorithms in the process of learning. Success of the genre c...
Evaluation of Jamendo Database as Training Set for Automatic Genre Recognition
Research on automatic music classification has gained significance in recent years due to a significant increase in the size of music collections. Music is easily available through mobile and internet platforms, so there is a need to manage music by categorizing it for search and discovery. This paper focuses on music classification by genre, which is a type of supervised learning oriented pr...